[57] ABSTRACT

A highly parallel cyclic redundancy code generator has p
precalculated k-bit remainder polynomials loaded in a
lookup table. A cyclic redundancy code register has a p-bit
portion and a k-bit portion. An input data message is input
to an input XOR gate together with the contents of the p-bit
portion to generate a p-bit result for storage in the p-bit
portion. The content of the p-bit portion is used to control
which k-bit remainder polynomials from the lookup table
are to be parallel XOR'd to produce a partial cyclic
redundancy code that is stored in the k-bit portion. The contents
of the CRC register are shifted to the left and the process
repeated until all of the bits of the input data message have
been processed. The contents of the k-bit portion are then
output as the CRC for the input data message.

5 Claims, 2 Drawing Sheets

HIGHLY PARALLEL CYCLIC REDUNDANCY CODE GENERATOR

BACKGROUND OF THE INVENTION

The present invention relates to checking the integrity of
digital data, and more particularly to a highly parallel cyclic
redundancy code (CRC) generator that produces partial CRCs in
parallel to any degree to greatly impact each and every
high-speed digital networking application.

The cyclic redundancy code (CRC) has been used for a long time
as a means to preserve the integrity of digital data for storage
and transmission. Treating the data, or message, as a binary
polynomial u(x), its CRC which corresponds to a particular
generator polynomial g(x) may be generated by first raising the
message polynomial to a proper power and then taking the
remainder r(x) of the message polynomial divided by the
generator polynomial. Early CRC implementations made use of the
concept of linear feedback shift registers (LFSR) in which
polynomial division is processed one bit at a time.

As the technology advanced single-bit CRC generation using LFSRs
was not enough to handle high-speed data processing and
transmission. Parallel CRC algorithms were then developed to
meet this need. Most of these algorithms generate CRCs in
software or in firmware, and some have been implemented in
special hardware to take advantage of very large scale
integrated (VLSI) circuit technology. These parallel CRC
algorithms, although improved over single-bit LFSR, are not
highly parallel in the sense that they can process at most one
or two bytes at a time, limited by the degree of the generator
polynomial. Byte-wise CRC generation is insufficient for very
high-speed protocol processing in gigabit-per-second ATM/SONET
environments. Considering the case where the internal data path
of a host processor is 64-bit, it is highly desirable to perform
64-bit CRC generation even though the degree of the generator
polynomial may only be 16 or 32. Existing CRC algorithms are
cumbersome in this situation.

The key reason that existing CRC algorithms are limited in their
degree of parallelism is deeply rooted in the concept of LFSRs.
All existing algorithms try to solve the same problem, i.e., how
to parallelize the bit-by-bit operation of LFSRs. As a result
the degree of parallelism never goes beyond the perceived size
of LFSRs.

What is desired is an improved, highly parallel CRC algorithm
which can generate partial CRCs in parallel to any degree.

SUMMARY OF THE INVENTION

Accordingly the present invention provides a highly parallel CRC
generator that produces partial CRCs in parallel to any degree.
Remainders of the form R_g[x^i], k ≤ i ≤ k+p-1, for a k+p-1
degree polynomial are precomputed and loaded into a lookup
table. A CRC register, having a p-bit portion and a k-bit
portion, is initialized to all zeros. A data message signal is
input to an exclusive OR gate where it is XOR'd p bits at a time
with the p-bit portion of the CRC register. The result is saved
back into the p-bit portion of the CRC register. The p-bit
portion of the CRC register is used to control which remainders
from the lookup table should be XOR'd via a parallel XOR tree.
When p ≥ k, the result of the parallel XOR operation is stored
into the k-bit portion of the CRC register. When p < k, the
result of the parallel XOR operation is XOR'd with the k-bit
portion of the CRC register, and the result is saved back to the
k-bit portion of the CRC register. The k-bit portion of the CRC
register provides the partial CRC outputs. Prior to each input
XOR operation the contents of the CRC register are shifted p
bits to the left. This process is iterative until the entire
data message has been processed, with the k-bit portion being
output as the final CRC for the data message.

The objects, advantages and other novel features of the present
invention are apparent from the following detailed description
when read in conjunction with the appended claims and attached
drawing.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagrammatic view of a portion of a highly
parallel CRC generator illustrating a parallel XOR tree circuit
according to the present invention, assuming p ≥ k.

FIG. 2 is a block diagrammatic view of a highly parallel CRC
generator according to the present invention, assuming p ≥ k.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Let u(x) be a message polynomial of degree n-1, i.e., with n
bits; g(x) be a generator polynomial of degree k; p be the
number of bits to be processed in parallel, usually greater than
k; and R_g[a(x)] be the remainder of a(x) divided by g(x). From
these definitions R_g[a(x)] is a polynomial of degree at most
k-1. The process of generating CRCs for u(x) is to find the
polynomial r(x) of degree at most k-1 such that:

$$R_g[x^k u(x)] = r(x).$$

Equivalently, there is a quotient polynomial q(x) satisfying the
following equality:

$$x^k u(x) = q(x)\,g(x) + r(x)$$

For binary polynomials the following are true:

i) R_g[x^i + x^j] = R_g[x^i] + R_g[x^j];

ii) R_g[x^(i+j)] = R_g[x^i R_g[x^j]].

The addition of binary polynomials is performed in the sense of
modulo-2. By definition:

$$x^k u(x) = x^k\left(x^{n-1}u_{n-1} + x^{n-2}u_{n-2} + \cdots + x^{1}u_{1} + x^{0}u_{0}\right)$$

Define l=[n/p], i.e., the largest integer smaller than or equal
to n/p, and rearrange x^k u(x) to be:

$$x^k u(x) = x^k\left(x^{n-p}u_{(1)}(x) + x^{n-2p}u_{(2)}(x) + \cdots + x^{n-(l-1)p}u_{(l-1)}(x) + u_{(0)}(x)\right)$$

where u_(1)(x) is a (p-1)-degree polynomial containing the first
p coefficients of u(x), u_(2)(x) is a (p-1)-degree polynomial
containing the next p coefficients of u(x), etc., and u_(0)(x)
contains the remaining terms of u(x) from x^(n-(l-1)p-1) to x^0.

Using the two facts of binary polynomials:

$$\begin{aligned}
R_g[x^k u(x)] &= R_g\big[x^k R_g[u(x)]\big] \\
&= R_g\Big[x^k\Big(R_g\big[x^{n-p}u_{(1)}(x)\big] + R_g\big[x^{n-2p}u_{(2)}(x)\big] + \cdots + R_g\big[x^{n-(l-1)p}u_{(l-1)}(x)\big] + R_g\big[u_{(0)}(x)\big]\Big)\Big]
\end{aligned}$$

or equivalently in an iterative form:

$$R_g[x^k u(x)] = R_g\Big[x^k R_g\Big[x^{n-(l-1)p} R_g\Big[x^p R_g\Big[\cdots R_g\Big[x^p R_g\Big[x^p R_g\big[u_{(1)}(x)\big]_{1} + u_{(2)}(x)\Big]_{2} + u_{(3)}(x)\Big]_{3} + \cdots\Big]_{l-2} + u_{(l-1)}(x)\Big]_{l-1} + u_{(0)}(x)\Big]_{l}\Big]_{l+1}$$

The indices on the closing brackets are used to identify bracket
pairs. By moving the term x^k inside the equation becomes:

$$R_g[x^k u(x)] = R_g\Big[x^{n-(l-1)p} R_g\Big[x^p R_g\Big[\cdots R_g\Big[x^p R_g\Big[x^p R_g\big[x^k u_{(1)}(x)\big]_{1} + x^k u_{(2)}(x)\Big]_{2} + x^k u_{(3)}(x)\Big]_{3} + \cdots\Big]_{l-2} + x^k u_{(l-1)}(x)\Big]_{l-1} + x^k u_{(0)}(x)\Big]_{l}$$

This equation provides a method of computing the CRC of
u(x) in an iterative fashion. Starting from the innermost
calculation at every iteration only the computation of the
remainder of the sum of, first, the remainder from the last
iteration multiplied by x^p, and, second, a polynomial
x^k u_(i)(x) of degree k+p-1 is needed. The term x^(n-(l-1)p)
raises the partial CRC to a proper power before summing with
x^k u_(0)(x) for final CRC generation.
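
By way of illustration only, the iteration described above can be checked numerically. The following Python sketch is not part of the original disclosure. It assumes polynomials are held as integers with bit i as the coefficient of x^i, the names `poly_mod` and `iterative_crc` are hypothetical, and the last block is taken to be whatever remains after the full p-bit blocks.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1                 # degree of g
    while a.bit_length() > k:              # while deg(a) >= k
        a ^= g << (a.bit_length() - 1 - k)
    return a


def iterative_crc(u: int, n: int, g: int, p: int) -> int:
    """R_g[x^k u(x)] for an n-bit message u, consuming up to p bits per iteration."""
    k = g.bit_length() - 1
    r = 0                                  # remainder from the last iteration
    remaining = n
    while remaining > 0:
        m = min(p, remaining)              # m < p only for a short final block
        remaining -= m
        block = (u >> remaining) & ((1 << m) - 1)   # next m coefficients, high order first
        r = poly_mod((r << m) ^ (block << k), g)    # R_g[x^m * r + x^k * block]
    return r


G = 0x107                                  # assumed example: x^8 + x^2 + x + 1
u, n = 0xDEADBEEF12, 40
assert iterative_crc(u, n, G, 16) == poly_mod(u << 8, G)
```

Each pass only ever reduces a polynomial of degree at most k+p-1, which is what makes the table-driven reduction possible.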

The parallel generation of the remainder of a polynomial
of degree k+p-1 is achieved by first pre-calculating the
remainders R_g[x^i], k ≤ i ≤ k+p-1, to form a lookup table.
Then at every iteration for those x^i terms with nonzero
coefficients their R_g[x^i] remainders are XOR'd together to
obtain a partial CRC. By writing the expansion of R_g[x^k u(x)]
in different iterative forms, various parallel CRC generators
may be realized. One possible CRC generation procedure is
described below based on the last equation.
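
As a minimal sketch of the table-driven reduction just described (illustrative only; the function names are hypothetical and the integer bit layout, bit i for the coefficient of x^i, is an assumption), the remainder of any polynomial of degree at most k+p-1 can be formed with XORs alone:

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


def build_table(g: int, p: int) -> list[int]:
    """table[j] = R_g[x^(k+j)] for 0 <= j < p, i.e. p entries of k bits each."""
    k = g.bit_length() - 1
    return [poly_mod(1 << (k + j), g) for j in range(p)]


def reduce_with_table(a: int, g: int, table: list[int]) -> int:
    """Remainder of a (degree at most k+p-1) using only XORs of table entries."""
    k = g.bit_length() - 1
    r = a & ((1 << k) - 1)          # terms below x^k are already reduced
    for j, rem in enumerate(table):
        if (a >> (k + j)) & 1:      # nonzero coefficient of x^(k+j)
            r ^= rem
    return r


G = 0x107                           # assumed example generator, k = 8
TABLE = build_table(G, 16)          # p = 16
a = 0xABCDEF                        # degree 23 = k + p - 1
assert reduce_with_table(a, G, TABLE) == poly_mod(a, G)
```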



/* Initialization */

1. Pre-compute lookup table entries R_g[x^i], k ≤ i ≤ k+p-1;

2. Initialize a register CRC[k+p-1 ... 0] to zeros;

/* Main loop */

3. While (there are more than p bits to be processed) do {

4. Shift CRC[k-1 ... 0] p bits to the left;

5. Input the next p bits from u(x), XOR them with
CRC[k+p-1 ... k], and save the result in CRC[k+p-1 ... k];

6. Look up the table for R_g[x^i], k ≤ i ≤ k+p-1, XOR
CRC[k-1 ... 0] with CRC[i]*R_g[x^i], k ≤ i ≤ k+p-1, and save
the result in CRC[k-1 ... 0];

}

/* Finish up */

7. Let m be the number of bits yet to be processed, shift
CRC[k-1 ... 0] m bits to the left;

8. XOR the last m bits from u(x) with CRC[k+m-1 ... k] and save
the result in CRC[k+m-1 ... k];

9. Look up the table for R_g[x^i], k ≤ i ≤ k+m-1, XOR
CRC[k-1 ... 0] with CRC[i]*R_g[x^i], k ≤ i ≤ k+m-1,
and save the result in CRC[k-1 ... 0].
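
The nine steps above can be sketched in software as follows. This is an illustrative model only, not the hardware of the invention: the register is modelled as a Python integer, the message is assumed to arrive as a list of bits with the highest-order coefficient first, the main loop and the finish-up are folded into one loop (m equals p except possibly on the last pass), and all names are hypothetical.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


def crc_procedure(bits: list[int], g: int, p: int) -> int:
    k = g.bit_length() - 1
    table = [poly_mod(1 << i, g) for i in range(k, k + p)]  # step 1: R_g[x^i]
    low_mask = (1 << k) - 1
    crc = 0                                                 # step 2
    pos = 0
    while pos < len(bits):
        m = min(p, len(bits) - pos)          # p in the main loop, m in the finish-up
        chunk = int("".join(map(str, bits[pos:pos + m])), 2)
        crc = (crc & low_mask) << m          # steps 4 and 7: shift the k-bit portion
        crc ^= chunk << k                    # steps 5 and 8: XOR input into CRC[k+m-1..k]
        acc = crc & low_mask                 # CRC[k-1..0] (all zeros when m >= k)
        for i in range(k, k + m):            # steps 6 and 9: gated XOR of remainders
            if (crc >> i) & 1:
                acc ^= table[i - k]
        crc = acc
        pos += m
    return crc


G = 0x107                                    # assumed example generator, k = 8
message = 0x0123456789ABCDEF
msg_bits = [int(b) for b in format(message, "064b")]
assert crc_procedure(msg_bits, G, 24) == poly_mod(message << 8, G)
```

With p = 24 the 64-bit example takes two full passes and a 16-bit finish-up; with p = 64 it takes a single pass.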



In implementing the above procedure in hardware the key
XOR operations in steps 6 and 9 are performed by a parallel
XOR tree, as shown in FIG. 1. A lookup table 10 has p
locations of k bits each, each location containing a
remainder value of R_g[x^i], where k ≤ i ≤ k+p-1. The remainders
are output in parallel to a plurality of gates 12 which in turn are
controlled by the p most significant bits of a CRC register.
The outputs from the gates 12 are combined in a parallel
XOR tree circuit 14 that has a plurality of two-bit XOR gates
16 in a tree configuration. The output from the last two-bit
XOR gate 16 in the tree is the partial CRC data for the p bits
of the message. This procedure does not perform table
lookup because each R_g[x^i] is always fetched with respect to
the same bit position of the CRC register. This is
advantageous for high-speed operation. The storage requirement of
the lookup table is only k×p bits, one of the smallest CRC
tables in use.
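
The gate-and-tree arrangement of FIG. 1 may be pictured with the following software analogy. It is a sketch under stated assumptions rather than a description of the actual circuit: each gate 12 is modelled as passing either its table entry or zero, the tree 14 is modelled as successive levels of pairwise XORs, and the function names are hypothetical.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


def xor_tree(values: list[int]) -> int:
    """Combine values with levels of two-input XORs, as a balanced tree would."""
    level = list(values) or [0]
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])      # an odd element passes through to the next level
        level = nxt
    return level[0]


def gated_partial_crc(control_bits: int, table: list[int]) -> int:
    """Gate each table entry with its control bit, then XOR-tree the gate outputs."""
    gated = [rem if (control_bits >> j) & 1 else 0 for j, rem in enumerate(table)]
    return xor_tree(gated)


G, K, P = 0x107, 8, 32                 # assumed example: k = 8, p = 32
TABLE = [poly_mod(1 << (K + j), G) for j in range(P)]
control = 0x12345678                   # contents of the p-bit portion
assert gated_partial_crc(control, TABLE) == poly_mod(control << K, G)
```

For p inputs the tree depth grows as the base-2 logarithm of p, so widening the data path adds few gate delays.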



A complete CRC circuit 20 is shown in FIG. 2. A
microprocessor (not shown) precomputes the remainders
that are loaded into the lookup table 10. A CRC register 22
is divided into two parts, a p-bit portion 22P and a k-bit
portion 22K. A control circuit 24 controls the CRC register
22, communicates with the microprocessor, and provides
appropriate timing signals. An input XOR gate 26 processes
a data message p bits at a time. The other inputs to the input
XOR gate 26 are the p most significant bits k through p+k-1
from the p-bit portion 22P. The output of the input XOR gate
26 is written back into the p-bit portion 22P. The p-bit
portion 22P also is applied bit by bit in parallel as a control
signal to respective output gates 12 to determine which
remainders from the lookup table 10 are input to the parallel
XOR tree circuit 14. The k-bit output from the parallel XOR
tree circuit 14 is loaded into the least significant bits 0
through k-1 of the k-bit portion 22K. The output from the
k-bit portion 22K is the partial CRC.

When applying this hardware to generate 8-bit CRCs over
the first four bytes of an Asynchronous Transfer Mode
(ATM) header, only one pass is sufficient to generate the
final CRC. This offers tremendous time saving over existing
CRC algorithms, for it would take them four passes, one-byte
partial CRC per pass, for generating the final CRC since
the generator polynomial is of degree 8 in this case.

In addition to its inherently high parallelism the present
invention is greatly universal, in that it is applicable to any
value of n, p and k as well as any generator polynomial.
Corresponding to each generator polynomial a different set
of pre-computed R_g[x^i]s is used. Because for each generator
polynomial only k×p bits are needed for storing its
corresponding R_g[x^i] set, a small amount of memory suffices to
house R_g[x^i] sets for all commonly used generator polynomials.
Thus a general purpose CRC processor may be built
around the suggested parallel XOR tree architecture. The
value of k varies with g(x), but the value of p is fixed in a
hardware XOR tree.

CRC calculation of many protocol headers falls into the
special case where n=p. In this case only the finish-up
portion of the above pseudo code is executed since n=m=p.
The pseudo code may be simplified by using a k-bit CRC
register and performing the single line operation as follows:

XOR u_i*R_g[x^(i+k)], for 0 ≤ i ≤ n-1, and save the result in
CRC[k-1 ... 0]

This is the operation to be used for the ATM cell header error
control (HEC). With such a one-pass CRC syndrome
generation capability this invention offers very high-speed
protocol header processing.
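
The single line operation can be sketched as below for a 32-bit header and a degree-8 generator. This is illustrative only: the generator x^8 + x^2 + x + 1 and the header value are assumptions chosen for the example, the names are hypothetical, and only the raw remainder is formed, without any further processing a particular protocol may apply to it.

```python
from functools import reduce
from operator import xor


def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


G = 0x107          # assumed degree-8 generator: x^8 + x^2 + x + 1
K = 8
N = 32             # n = p = 32 header bits, so one pass suffices

# TABLE[i] = R_g[x^(i+k)] for 0 <= i <= n-1
TABLE = [poly_mod(1 << (i + K), G) for i in range(N)]


def one_pass_crc(u: int) -> int:
    """XOR u_i * R_g[x^(i+k)] over all i; u_i is the coefficient of x^i in u(x)."""
    return reduce(xor, (TABLE[i] for i in range(N) if (u >> i) & 1), 0)


header = 0x12345678                    # hypothetical 4-byte header value
assert one_pass_crc(header) == poly_mod(header << K, G)
```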

When n is a multiple of p eventually m equals p. In this
case the CRC register needs to support only p-bit shift. A
register with hardwired fixed-length shift offers significant
improvement in speed over linear shift registers. Most
protocol data unit (PDU) definitions fall into this category
for they are 64- or 32-bit aligned. The ATM cell payload
which has a fixed length of 48 bytes also belongs to this
category. Using a 64-bit version of this invention the CRC
for a cell payload is computed in six passes.

Another nice feature of this invention is that the length of
a message is not required to start the CRC computation. As
seen from the pseudo code the finish-up part of the code
adjusts properly the partial CRC for final CRC computation.
It is thus possible to generate a partial CRC with a partially
available message. This is useful in computing CRC over
compressed, variable-bit-rate video, since the compressed
video may be generated on the fly and the user may not have
control over when and where a video frame or field
terminates. Splitting the message polynomial u(x) into two
disjoint polynomials, u'(x) and u"(x), from the basic properties
of binary polynomials the following is true:

$$R_g[x^k u(x)] = R_g[x^k u'(x)] + R_g[x^k u''(x)]$$

This equation has different interpretations depending upon
the selection of u'(x) and u"(x). If u'(x) is a fixed-length but
originally unknown portion of the message and u"(x) is the
rest of the message, R_g[x^k u"(x)] may be computed first and
added to R_g[x^k u'(x)] when available to generate the final
CRC. This is the case in an IP router where the packet
payload is available but the destination IP address is yet to
be resolved, or in a multiprotocol environment where the
data payload is fixed but the packet header is updated due to
protocol conversion, even though different protocols make
use of the same generator polynomial.
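
A small numerical check of this splitting property is sketched below, for illustration only. The generator, the message value and the header/payload boundary are assumptions made for the example, and `crc` is a hypothetical helper returning R_g[x^k u(x)].

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


G, K = 0x107, 8                        # assumed example generator of degree 8


def crc(u: int) -> int:
    """R_g[x^k u(x)]"""
    return poly_mod(u << K, G)


u = 0x0A0B0C0D11223344                 # hypothetical 64-bit packet
header_mask = 0xFFFFFFFF00000000       # assumed: the top 32 bits are the header

u_header = u & header_mask             # u'(x): header coefficients, zeros elsewhere
u_payload = u & ~header_mask           # u"(x): the rest of the message
assert u_header ^ u_payload == u       # the two parts are disjoint and cover u(x)

payload_crc = crc(u_payload)           # may be computed before the header is known
final_crc = payload_crc ^ crc(u_header)
assert final_crc == crc(u)
```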

When u'(x) represents the odd words and u"(x) the even
words of u(x), two 32-bit CRC circuits may be incorporated
for CRC computation for a 64-bit data path. In order to
support this concurrent operation the p value in the pseudo
code is replaced by 2p and R_g[x^i] is pre-computed for
k ≤ i ≤ k+2p-1. The finish-up portion of the pseudo code is
properly modified so as to combine the partial CRC of the
two CRC circuits for generating the final CRC. Similarly
four 16-bit CRC circuits may be used for the 64-bit path. The
advantage of using duplicate CRC circuits is to leverage
existing CRC parts for high-speed CRC computation. By
employing enough CRC circuits in parallel or pipelining or
both, the CRC may be computed as fast as the data can be
moved. There is however a short latency spent for CRC
computation start-up and finish-up.

Another extension of the present invention is that when a
small part of the message is intentionally altered, the CRC
may be updated without full-blown recomputation.
According to the properties of binary polynomials:

$$\begin{aligned}
R_g[x^k u'(x) + x^k u''(x)] &= R_g[x^k u'(x) + x^k c(x) + x^k c(x) + x^k u''(x)] \\
&= R_g[x^k u'(x) + x^k c(x)] + R_g[x^k c(x) + x^k u''(x)].
\end{aligned}$$

Rearranging produces:

$$R_g[x^k c(x) + x^k u''(x)] = R_g[x^k u(x)] + R_g[x^k u'(x) + x^k c(x)]$$

where u'(x) is the section of the message to be replaced,
u"(x) is the rest of the message, and c(x) is the new text to
replace u'(x). The above equation indicates that the CRC of
the new message, c(x)+u"(x), may be obtained by adding the
existing CRC with R_g[x^k u'(x)+x^k c(x)]. This is useful where
a packet header has to be modified from hop to hop, or in the
situation of multiprotocol conversion in which only the fixed
size header and/or tail of a message is updated and the body
of the data payload is left unaltered. If the length of the
packet is foreknown or fixed, the R_g[x^i] set that corresponds
to u'(x) may be pre-computed. The generation of
R_g[x^k u'(x)+x^k c(x)] is achieved by simply XOR-ing the
corresponding R_g[x^i] terms according to the sum of u'(x) and c(x).
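
The update rule may be illustrated as follows. This is a sketch under stated assumptions, not the circuit itself: the generator, the packet layout (a 32-bit header above a 32-bit payload) and the function names are hypothetical, and `poly_mod` stands in for the XOR of the pre-computed R_g[x^i] terms.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


def updated_crc(old_crc: int, old_part: int, new_part: int, shift: int, g: int) -> int:
    """CRC after old_part is replaced by new_part.

    shift is the number of message bits below the replaced section, so the
    difference u'(x) + c(x) is positioned where the section sits in u(x).
    """
    k = g.bit_length() - 1
    delta = (old_part ^ new_part) << shift      # u'(x) + c(x)
    return old_crc ^ poly_mod(delta << k, g)    # add R_g[x^k u'(x) + x^k c(x)]


G, K = 0x107, 8                                 # assumed example generator
payload = 0x11223344
old_header, new_header = 0xC0A80001, 0xC0A800FE

old_msg = (old_header << 32) | payload
new_msg = (new_header << 32) | payload
old_crc = poly_mod(old_msg << K, G)

assert updated_crc(old_crc, old_header, new_header, 32, G) == poly_mod(new_msg << K, G)
```

Only the bits that differ between the old and new header contribute table entries, so a small change costs a small number of XORs regardless of payload length.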

CRC specifications usually involve extra one's complement
operations, and the present invention is applicable in
those cases also. As an example using the ANSI CRC-32
specification employed by the IEEE 802-series networks,
including Ethernet, FDDI, token ring and token bus, the
CRC-32 generator polynomial is:

$$g(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$$

The CRC-32 specification may be written as:

$$R_g[x^{n} z(x) + x^{32} u(x)] + z(x)$$

where z(x)=x^31+x^30+ . . . +x+1. Since n is the size of the
message polynomial, adding x^n z(x) to x^32 u(x) has the effect
of inverting the first 32 bits of u(x), whereas adding z(x)
yields the one's complement of R_g[x^n z(x)+x^32 u(x)]. From
this the CRC-32 for u(x)=u'(x)+u"(x) may be derived as:

$$\begin{aligned}
R_g[x^{n} z(x) + x^{32} u'(x) + x^{32} u''(x)] + z(x)
&= R_g[x^{n} z(x) + x^{32} u'(x) + x^{32} c(x) + x^{32} c(x) + x^{32} u''(x)] + z(x) \\
&= R_g[x^{32} u'(x) + x^{32} c(x)] + \{R_g[x^{n} z(x) + x^{32} c(x) + x^{32} u''(x)] + z(x)\}.
\end{aligned}$$

Rearranging produces:

$$R_g[x^{n} z(x) + x^{32} c(x) + x^{32} u''(x)] + z(x) = \{R_g[x^{n} z(x) + x^{32} u'(x) + x^{32} u''(x)] + z(x)\} + R_g[x^{32} u'(x) + x^{32} c(x)]$$

Thus immaterial to the extra operations of x^n z(x) and z(x),
the CRC-32 syndrome for the new message c(x)+u"(x) is
obtained by adding the existing CRC with
R_g[x^32 u'(x)+x^32 c(x)].
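
For illustration only, the following sketch evaluates the CRC-32 expression exactly as written above and confirms that the same update term applies despite the complement operations. It works directly on polynomials held as integers and makes no claim about the bit or byte ordering used by any particular network interface; the message values and function names are hypothetical.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


G32 = 0x104C11DB7      # x^32 + x^26 + x^23 + x^22 + x^16 + ... + x + 1
Z = 0xFFFFFFFF         # z(x) = x^31 + x^30 + ... + x + 1


def crc32_spec(u: int, n: int) -> int:
    """R_g[x^n z(x) + x^32 u(x)] + z(x) for an n-bit message u."""
    return poly_mod((Z << n) ^ (u << 32), G32) ^ Z


def crc32_update(old_crc: int, old_part: int, new_part: int, shift: int) -> int:
    """Add R_g[x^32 u'(x) + x^32 c(x)] to the existing CRC-32."""
    delta = (old_part ^ new_part) << shift
    return old_crc ^ poly_mod(delta << 32, G32)


n = 64
payload = 0x11223344
old_header, new_header = 0xC0A80001, 0xC0A800FE    # hypothetical header values
old_msg = (old_header << 32) | payload
new_msg = (new_header << 32) | payload

existing = crc32_spec(old_msg, n)
assert crc32_update(existing, old_header, new_header, 32) == crc32_spec(new_msg, n)
```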

The alteration of the message above is performed bit by
bit. However there are instances where bits are inserted or
deleted, thus yielding expansion or shrinkage of the message
size. It is possible to expand the header and/or tail of a
message according to the present invention without
recomputing the CRC over the message body. Expanding the
header is mathematically equivalent to adding new high-order
coefficients to the message polynomial, and thus only
the remainders corresponding to these new coefficients need
to be added to the existing result for generating the final
CRC. Adding new bits at the tail of a message is similar to
the processing of the tail of a message whose size is
unknown up front. The header or tail expansion property of
the present invention is useful in appending a digital signature
to a CRC-protected message or concatenating two
CRC-protected messages into a single one. In the latter case the
CRC of u(x) concatenated with v(x) is:

$$\begin{aligned}
R_g\big[x^k\{x^q u(x) + v(x)\}\big] &= R_g[x^{q+k} u(x)] + R_g[x^k v(x)] \\
&= R_g\big[x^q R_g[x^k u(x)]\big] + R_g[x^k v(x)]
\end{aligned}$$

where q is the size of v(x). Pre-computing R_g[x^i] for
q ≤ i ≤ q+k-1 and storing R_g[x^k u(x)] in the k-bit CRC
register, the last equation indicates the final CRC of the
concatenated message is the XOR-ing of R_g[x^k v(x)] and
CRC[i]*R_g[x^(q+i)], 0 ≤ i ≤ k-1. This is obtained by the parallel
XOR tree 14 in a single pass, given that the proper R_g[x^i]
terms are pre-computed. Since in practice it is not possible
to store every possible R_g[x^i], we may selectively store a
number of R_g[x^i] sets where each set has p R_g[x^i] terms and
the leading x terms of these sets are separated by the power 
of x^'', x'*^, x*^, and so on. The purpose is to trade off some 
memory space for facilitating CRC computation of
concatenated messages.
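
The concatenation rule can be sketched as below, for illustration only. The generator, the two message values and the function name are assumptions; the table holds R_g[x^(q+i)] for 0 ≤ i ≤ k-1 and is gated by the bits of the stored CRC of u(x).

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of binary polynomial a modulo g (bit i = coefficient of x^i)."""
    k = g.bit_length() - 1
    while a.bit_length() > k:
        a ^= g << (a.bit_length() - 1 - k)
    return a


def concat_crc(crc_u: int, crc_v: int, q: int, g: int) -> int:
    """CRC of u(x) followed by the q-bit message v(x), from the two separate CRCs."""
    k = g.bit_length() - 1
    table = [poly_mod(1 << (q + i), g) for i in range(k)]   # R_g[x^(q+i)]
    shifted = 0
    for i, rem in enumerate(table):
        if (crc_u >> i) & 1:          # CRC[i] * R_g[x^(q+i)]
            shifted ^= rem
    return shifted ^ crc_v            # R_g[x^q R_g[x^k u]] + R_g[x^k v]


G, K = 0x107, 8                       # assumed example generator
u = 0xDEADBEEF12                      # first message
v, q = 0x00ABCD, 24                   # second message, q = 24 bits with leading zeros

crc_u = poly_mod(u << K, G)
crc_v = poly_mod(v << K, G)
assert concat_crc(crc_u, crc_v, q, G) == poly_mod(((u << q) | v) << K, G)
```

Note that q must count every bit of v(x), including leading zeros, since it fixes how far u(x) is raised.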

Thus the present invention provides a method and
apparatus for highly parallel CRC computations that inherently
has a high degree of parallelism in hardware to provide
realtime calculations in the very high speed digital data era.

What is claimed is:

1. A highly parallel cyclic redundancy code generator
comprising:

means for storing p predetermined remainder polynomials
of k bits each, where p is the number of bits to be
processed in parallel and k is the degree of the
polynomial;

a register having a p-bit portion and a k-bit portion, the
k-bit portion providing an output partial cyclic
redundancy code;

means for combining the contents of the p-bit portion with
an input data message for storage in the p-bit portion;
and

means for parallel XORing the remainder polynomials
from the storing means according to the contents of the
p-bit portion to produce the output partial cyclic
redundancy code for storage in the k-bit portion.

2. The highly parallel cyclic redundancy code generator as
recited in claim 1 wherein the storing means comprises a
lookup table having the p predetermined remainder
polynomials loaded therein to provide p parallel outputs for input
to the parallel XORing means.

3. The highly parallel cyclic redundancy code generator as
recited in claim 1 wherein the combining means comprises
an XOR circuit having p bits in parallel of the input data
message as one input and the p bits from the p-bit portion as
a second input, the output of the XOR circuit being coupled
to store the result of the XOR operation into the p-bit
portion.

4. The highly parallel cyclic redundancy code generator as
recited in claim 1 wherein the parallel XORing means
comprises:

p gates coupled one to each output of the storing means
to receive the p predetermined remainder polynomials
in parallel, each p gate being controlled by a respective
bit of the p-bit portion; and

a parallel XOR tree having the outputs from the p gates as
inputs to XOR in parallel those outputs passed by the
p-bit portion, and having an output coupled to the k-bit
portion for storing the results of the parallel XOR
operation as the output partial cyclic redundancy code.

5. A method of generating a highly parallel cyclic
redundancy code comprising the steps of:

initially storing p predetermined k-bit remainder
polynomials in a lookup table, where p is the number of bits
to be processed in parallel and k is the degree of the
polynomial;

initially setting a cyclic redundancy code register, having
a p-bit portion and a k-bit portion, to zero;

shifting the contents of the cyclic redundancy code
register to the left;

XORing the contents of the p-bit portion with an input
data message for storage in the p-bit portion;

parallel XORing the p k-bit remainder polynomials
according to the contents of the p-bit portion to
generate an output k-bit partial cyclic redundancy code for
storage in the k-bit portion; and

repeating the shifting, XORing and parallel XORing steps
until all of the bits of the input data message have been
processed, the resulting contents of the k-bit portion
being the cyclic redundancy code for the input data
message.

* * * *

DEVELOPMENT

This invention was made with Government support under Agreement 
No. N00014-94-C-2168 awarded by Department of Navy. The Government 
has certain rights in the invention.* 



Signed and Sealed this 
Third Day of April, 2001 



Attest: 




NICHOLAS P. GODICI



Attesting Officer 



Acting Director of the United States Patent and Trademark Office






